the ollama course - finding models

Welcome back to the Ollama course. This is a free course available here on this YouTube channel that will teach you everything you need to know about working with Ollama to help make you an Ollama Pro. In the videos so far, we’ve looked at an overview of how to work with Ollama, and then at how to install Ollama on Windows, Linux, and Mac. In this video we’re going to take a look at how to find the right model for you, depending on what kinds of things you want to do.

There are lots of people and pages on the internet that will tell you one model is better than all the others. But I don’t think it’s possible to say that any one model is better than any other model. There are plenty of benchmarks and leaderboards that claim this, but when you start playing with a model using questions that are more relevant to you, you’ll probably find that the model listed as the best isn’t always the best. That said, you should definitely try all the new models, assuming it’s easy enough to find and get them.

Some people have much slower internet connections than others. If you have a slow connection and you want to experiment with a new model that is huge, you might want to consider setting up a remote host on one of the cloud providers, which are going to have much faster internet connections. Then you can download the model, play around with it on that platform, and when you find the right model for you, then and only then download it to your own local machine.

But in this video I want to focus on how to find the different models that are available. We saw a little bit about this in the first video, where I briefly looked at the list of models on Ollama.com. In this one I want to go into that in a little more detail.

So let’s navigate to Ollama.com and then, at the top, click on Models. When you first visit this page, it’s sorted by Featured. Featured just means the Ollama team has picked a few models it thinks are the most interesting and popular today and put those right at the top. Llama 3.1, at the time of this recording, is one of those models, so it’s shown at the very top of the list. When Llama 2 came out, it sat at the top of the list for quite a while, even after newer models had been released.

So let’s take a look at this list and what we can do here. First off, at the top there’s a search box. It’s not really a search, but rather a filter. We can enter some text and we’ll find all the models that have that text in the name. The models that come up are just the official models that are part of the core library, and those are the models submitted by the Ollama team.

There’s also a search box at the top right and that’s for searching all the models on Ollama.com including the official models as well as models submitted by other people.

Down in the list, the same types of information are shown for each model. First, there’s the title. In my case right now it’s “Llama 3.1”. Below that is a short description of what’s special about this model. Below that are the main categories this model fits into. So we see “Tools”. What “Tools” means is that the model supports the newer method for calling tools. That doesn’t mean models without the Tools category can’t call tools. There’s another method that will let you use tool calling with any model in Ollama, and in fact that other way tends to be a bit more reliable for now than the newer one. Next to that is “8B, 70B, and 405B”. That refers to the parameter sizes for this model.
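As a rough illustration of that older, works-with-any-model approach: you describe the tool in the prompt and ask the model to reply only with JSON, then parse that JSON yourself. This is just a sketch; the get_weather tool and the prompt wording here are made up for illustration, not a fixed API.

```
# Prompt-based "tool calling" that works with any model: ask for JSON, parse it yourself.
# get_weather is a hypothetical tool name used only for this example.
ollama run llama3.1 'You can call one tool: get_weather(city).
If the user asks about weather, reply ONLY with JSON like {"tool": "get_weather", "city": "<name>"}.
User question: What is the weather in Paris right now?'
```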

I talked a little bit about this before. The B in 8B, 70B, and 405B refers to billions of parameters, and the parameters are the weights and biases of the model. A model is made up of many, many nodes that represent concepts the model understands, and the weights are the connections between those nodes. Each weight has a 32 bit floating point number associated with it that says how close the different nodes are to each other, and two nodes might have multiple connections between them depending on the context of each weight. When you count up all the weights and biases, that’s the number of parameters that define this particular model. So the 8B model has 8 billion parameters, or 8 billion weights and biases, within the model.
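To get a feel for what those numbers mean for memory, here’s some back-of-the-envelope math. These are rough estimates that ignore overhead like the context buffer; the actual download size is listed on each tag’s page.

```
# Rough memory math for an 8B-parameter model:
#   at 32-bit floats, each parameter takes 4 bytes:  8 billion x 4 bytes  = ~32 GB
#   quantized to 4 bits, each takes half a byte:     8 billion x 0.5 byte = ~4 GB
echo "$((8 * 4)) GB at fp32"
echo "$((8 / 2)) GB at 4-bit"
```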

Below that we see how many times this model has been pulled from Ollama.com. At the time of recording, Llama 3.1 has been pulled nearly five hundred thousand times. Next to that is how many tags are associated with this model. We’re going to take a look at tags in a little bit, but tags are basically the different types or configurations of this model that are available, and sometimes one tag can be an alias for many other tags. Again, we’ll see that in a little bit. Next to that is how long ago this model was last updated; at the time of this recording, Llama 3.1 was updated five days ago. That pattern is repeated for each of the models in this list.

So let’s click on Llama 3.1 to see what additional information shows up on the details page. At the top of the details page we see a lot of the same information. Then way down at the bottom we see more details about this particular model. The details section is going to be different for every model. I would love it if there were a standard format for this so you could always rely on finding the same information on every page, but we don’t get that. Sometimes we get selective benchmarks that the model creator likes to show. They’re rarely from the same benchmark suite, and they’re always pretty selective about what gets shown, to make it seem like this model is the best. A lot of what’s listed down toward the bottom I don’t find very useful.

Let’s go back up to the top. Most of this we saw in the main list, but there is a dropdown showing the most popular tags. And then on the right, for whichever tag is selected, we have the command to pull that model and run it. You can copy the command and paste it into your own terminal. And then this other icon is something I have from an extension called Page Assist; I’ll talk more about that in a future video.
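For example, the copied command usually looks like one of these, depending on which tag you selected:

```
# Pull the default (latest) tag and drop into an interactive chat:
ollama run llama3.1

# Or just download a specific tag without starting a chat:
ollama pull llama3.1:70b
```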

The table below that shows the layers of the model. A large part of the team has a strong connection to Docker, so the Ollama model structure is strongly influenced by the Docker image and container structure. At the core of Docker containers is the idea of a layer. You might have a bottom layer that defines the core OS functionality, and then another layer that has Node.js installed. When you get a different container for Python, it might use the same core OS layer as the Node container, so it doesn’t need to redownload that layer. Instead it downloads only the layers that are different from what’s in the local cache.

Ollama does something similar, and it uses multiple layers to handle the different parts of a model. Most tools out there only let you download one part of the model, and the other parts can be confusing to set up. To use a model, you need the model weights file, and you also need a template, and probably a system prompt. Different models use different templates and react differently to different system prompts, but most tools leave it up to you to figure this out.

We can see the layers of the model in this table. The first layer is the model itself. Then there is a layer for parameters. Meta requires the license agreement to be supplied with the model, so the license layer has that agreement. Finally there is a template. Some models might also have a system prompt, an adapter, or some other layer types. We can click on any of these to see the full layer contents, or a description of the contents.
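If you’ve already pulled the model, you can inspect those same layers locally from the terminal; these flags exist in current versions of the CLI, but check `ollama show --help` on your install:

```
ollama show llama3.1 --modelfile   # the full Modelfile: FROM, TEMPLATE, PARAMETER, LICENSE, etc.
ollama show llama3.1 --template    # just the prompt template layer
ollama show llama3.1 --parameters  # just the parameters layer, such as stop sequences
ollama show llama3.1 --license     # just the license text
```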

So click on template and we see a pretty big chunk of text that defines how to format the prompt, depending on how you are using the model. Here is the template for gemma2, which doesn’t support the new version of tool use, so you can see how simple some templates can be. And here is the template for qwen2, so you can see how different templates can be depending on the model.
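For reference, a simple template like gemma2’s looks roughly like this; I’m reproducing it from memory, so check the model page for the exact text:

```
$ ollama show gemma2 --template
<start_of_turn>user
{{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
```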

Going back and clicking on parameters shows us the parameters defined for this model. In this case it’s just a list of stop sequences: words and symbols that tell Ollama to stop generating when the model outputs them. You might also see a temperature here, or top_p or top_k, or others.
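For Llama 3.1 those parameters are stop sequences that look roughly like this; again, check the page itself for the exact values:

```
$ ollama show llama3.1 --parameters
stop    "<|start_header_id|>"
stop    "<|end_header_id|>"
stop    "<|eot_id|>"
```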

Finally, let’s look at the model layer. This has a lot of info about the configuration of the model. For instance, we can see it uses the llama architecture. Many of the important values that can be helpful to look at start with the architecture type, so the max context size this model supports is llama.context_length, and this model is set to 128k.
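You can see similar info locally with a plain `ollama show`; the exact output format varies by Ollama version, but it looks something like this:

```
$ ollama show llama3.1
  Model
    architecture        llama
    parameters          8.0B
    context length      131072
    quantization        Q4_0
```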

An important note about context length: all models in Ollama, with a few exceptions, are set to use a context length of 2k, or 2048 tokens. To see if a model you are looking at has a different default, look for num_ctx in the parameters layer. To change it, you would define a new Modelfile with the parameter num_ctx set to 131072, which is 128k. But be careful: if you don’t have a lot of extra memory, you could crash your machine and have to restart.
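Here’s a minimal sketch of that; the Modelfile contents are standard Modelfile syntax, and llama3.1-128k is just an example name I picked:

```
# Write a two-line Modelfile that bumps the context window to 128k:
cat > Modelfile <<'EOF'
FROM llama3.1
PARAMETER num_ctx 131072
EOF

# Build a new local model from it, then run it:
ollama create llama3.1-128k -f Modelfile
ollama run llama3.1-128k
```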

OK, so go back to the main page for llama3.1 and then click on the tags link. This shows us all the tags available for this model. The one at the top, latest, is a rather unfortunate name, because it has nothing to do with being the latest. When I was still on the Ollama core maintainer team, I tried to get rid of the ‘latest’ tag, but never won that battle. Latest simply means the most popular tag. That’s usually going to be the smallest parameter count, quantized to 4 bits, since 4 bits offers the fastest performance with minimal loss of quality from the original 32 bit floating point. Below each tag name you’ll see a hash. Some tags have the same hash value, indicating they are aliases of each other. If you click on any of the tag names, you’ll see the information for that tag on the model description page. We can also click the copy button to copy the run command for that tag.
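Tag names generally encode the parameter count, the variant, and the quantization, so pulling a specific configuration looks like this. The exact tag below is an example; check the tags page for what’s actually available:

```
# 8B parameters, instruct variant, 8-bit quantization:
ollama pull llama3.1:8b-instruct-q8_0
ollama run llama3.1:8b-instruct-q8_0
```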

That’s all the info listed for each model.

Now, how do you decide which is the best model for you? Well, unfortunately there is no good way to tell. Lots of folks on the Discord will just regurgitate the info in the leaderboards, which isn’t all that helpful. The best you can do is come up with your own list of questions that are representative of the types of questions you are likely to ask. Then ask each question in 10 separate sessions with Ollama. If you ask just once, there’s a good chance the answer will be better or worse than normal; you have to ask a few times to see how the model tends to answer. Then you start to get an idea of which models work best for you. I hope we finally get a better solution to this sometime in the future.
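A minimal sketch of that loop in the terminal, where the model and the question are placeholders for your own:

```
MODEL="llama3.1"
QUESTION="Summarize the plot of Hamlet in two sentences."

# Each `ollama run` invocation here is a fresh one-shot session,
# so earlier answers don't influence later ones.
for i in $(seq 1 10); do
  echo "--- run $i ---"
  ollama run "$MODEL" "$QUESTION"
done
```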

For now, there is no easy fix to finding the best model for you.

But I hope this helps you with finding new models to try. Be sure to subscribe to find out when new videos in this free course are posted. There will be another posted very soon. Thanks so much for watching. Goodbye.